home *** CD-ROM | disk | FTP | other *** search
File List | 1986-01-18 | 8.4 KB | 219 lines |
- REFERENCE:
-
- FFT32010 is a collect of 9 FFT programs written in the TMS32010 source code.
- These programs and the following writeup can be found in the Chapter 5 of
- "DFT/FFT and Convolution Algorithms - Theory and Implementation," C.S. Burrus
- and T.W. Parks, Wiley, 1985.
-
-
-
-
- TABLE 5-1. TMS32010 ASSEMBLY LANGUAGE PROGRAM INDEX
-
-
- _____________________________________________________________________
- | PROGRAM DESCRIPTION PAGE |
- |-------------------------------------------------------------------|
- | 1 Direct DFT with table lookup 148 |
- | 2 Second-order Goertzel with forward-order input 152 |
- | 3 Basic one-butterfly Cooley-Tukey radix-2 FFT 156 |
- | 4 Basic one-butterfly radix-4 FFT 161 |
- | 5 Three-butterfly radix-4 FFT with reduced storage 168 |
- | 6 Straight-line 64-point radix-4 FFT 179 |
- | 7 Two-butterfly radix-8 FFT 190 |
- | 8 Prime-factor algorithm FFT with unscrambling 205 |
- | 9 Direct convolution routine 220 |
- ---------------------------------------------------------------------
-
-
-
-
- Chapter 5. TMS32010 ASSEMBLY LANGUAGE PROGRAMS
- FOR THE DFT AND CONVOLUTION
-
-
- 5.1 INTRODUCTION
-
- The TMS32010 assembly language programs in this chapter are
- based on the FORTRAN routines from Chapter 4. As much as
- possible, these programs retain the program structure and
- variable names of their FORTRAN counterparts. Many of the
- comments in these programs are lines of FORTRAN from the
- prototype routine which help explain the assembly language.
-
- The FORTRAN programs have been followed as much as possible
- to retain clarity and consistency. At the same time, some
- assumptions have been made to take advantage of the features
- of the TMS32010 instruction set. First, many of the
- programs assume a minimal amount of external memory used to
- hold the data which is addressed via peripheral I/O
- instructions, i.e., an address is initially sent to a data
- address counter and then the data word read from or written
- to the memory. A counter is used to address the data
- locations sequentially since complex data is used whose
- structure consists of the real and imaginary parts of a
- number in consecutive memory locations. A Q15 format for
- number representation is assumed for all data and
- coefficient values. This format (one sign bit + 15
- fractional bits; the absolute value of all represented
- numbers is less than one) simplifies calculations in
- fixed-point machines, such as the TMS32010 digital signal
- processor. No scaling on intermediate values in the
- calculation was done, thus helping to clarify the code. For
- most data input sequences, scaling is unnecessary since they
- will not produce accumulator overflow in the processor.
- Table lookup of coefficients is used for speed and
- simplicity. These programs use a full size table (often the
- size of the transform). Smaller tables may be used at the
- expense of speed if program memory size is limited. All
- programs have been assembled using a Texas Instruments
- cross-assembler running on a VAX 11-780 with the VMS
- operating system.
-
-
-
- 5.2 TMS32010 PROGRAMS
-
- Program 1 is a direct DFT similar to the first FORTRAN
- program. All data is initially in on-chip data RAM, and
- results are output sequentially. The data is on page 0 and
- is addressed indirectly with the auxiliary registers. All
- other variables are located on page 1 of the TMS32010. Full
- 32-bit intermediate values are retained for maximum
- accuracy. Note that the 32-bit running sum is initialized
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Page 2
-
-
- by shifting the first data point 15 bits which simulates a
- multiply by one in Q15 format. In this and all the other
- assembly language programs, arrays and loop counters start
- at zero, unlike the FORTRAN routines.
-
- Program 2 is a TMS32010 implementation of a second-order
- Goertzel's algorithm with forward-order inputs similar to
- FORTRAN Program 4. Like the DFT program, input data is
- initially on page 0 of data RAM and is addressed indirectly.
- Results are output sequentially. The 2*cosine factor in the
- inner loop is implemented by adding twice the product of a
- single cosine multiply. This 2*cosine term along with the
- feedback loop make this algorithm very susceptible to
- quantization error and overflows on a fixed-point machine.
- Since no scaling is done in this algorithm, the data should
- be scaled previously to guard against overflow.
-
- A single butterfly radix-2 Cooley-Tukey FFT, similar to
- FORTRAN Program 6, is shown in Program 3. All data is in
- external data memory using hardware described in the
- introduction of this chapter. Because the real and
- imaginary parts of the complex input data are in sequential
- locations, not in separate arrays, the data index I has a
- value twice that in the corresponding FORTRAN routine. The
- variable IE, the index increment for the coefficients, is
- computed with multiplies by 2 (implemented with a shift) in
- order to eliminate divisions which are inefficient. The
- size of the transform is limited to a power of 2, and the
- largest transform is simply a function of the amount of
- available program memory. Program 4 is just like Program 3
- but with a single radix-4 butterfly. The transform size is
- limited to a power of 4.
-
- Program 5 is a radix-4 algorithm with three butterflies,
- similar to FORTRAN Program 13. Both auxiliary registers are
- used as loop counters in the program. Auxiliary Register 1
- (AR1) replaces K in the DO 10 loop, and Auxiliary Register 0
- (AR0) is used in the DO 1 and DO 20 loops. These registers
- are efficient as counters in code with a lot of loops.
-
- Program 6 is a 64-point complex FFT optimized for speed.
- The algorithm is a three-butterfly radix-4 with all
- straight-line code. This program uses the macro capability
- of the assembler to generate the code, which is almost 2800
- lines long when expanded. Three macros are used, one for
- each type of butterfly. All data begins and ends on page 0
- of data RAM, and page 1 contains two temporary locations
- which are addressed by auxiliary registers. The 13-bit
- immediate coefficients are used as part of the MPYK
- (multilpy immediate) instruction. Butterflies are accessed
- in the same order as in Program 5, and all data points are
- addressed directly for speed.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Page 3
-
-
- A radix-8 FFT is shown in Program 7. This is a
- two-butterfly FFT and corresponds to Program 16 in FORTRAN.
- Because this a radix-8 program, there are half as many
- external data accesses as for the looped radix-4. This
- helps to speed up the program. The tradeoff is in a more
- complex program and fewer transform sizes which can be
- computed.
-
- Program 8 is a prime-factor FFT, corresponding to FORTRAN
- Program 17. This program is specific for a length-63 PFA,
- but the structure can be used for all module and transform
- sizes. Coefficients are in a table initially and then read
- into data RAM for use during execution. Because the
- unscrambling is not an in-place algorithm, the final
- unscrambled results are written to port 2, which corresponds
- to arrays A and B in the FORTRAN program.
-
- A direct calculation of linear convolution is shown in
- Program 9. In this example, an infinite input sequence is
- convolved with a 32-bit permanent sequence which represents
- the impulse response of a bandpass filter. Input is read
- sequentially from port 0, and output is written to port 1.
- The main computation in the convolution is done using the
- pair of instructions, LTD and MPY, which multiplies the
- sequences point by point and shifts the input sequence for
- the next loop.
-
-
-
-
-
-
- REFERENCE
-
- 1. Texas Instruments, TMS32010 USER'S GUIDE. Houston, TX:
- Texas Instruments, 1983.
-
-
-
-